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SUMMARY 

Use of the stochastic Galerkin finite element methods leads to large systems of linear equations obtained 
by the discretization of tensor product solution spaces along their spatial and stochastic dimensions. These 
systems are typically solved iteratively by a Krylov subspace method. We propose a preconditioner which 
takes an advantage of the recursive hierarchy in the structure of the global matrices. In particular, the matrices 
posses a recursive hierarchical two-by-two structure, with one of the submatrices block diagonal. Each 
one of the diagonal blocks in this submatrix is closely related to the deterministic mean-value problem, 
and the action of its inverse is in the implementation approximated by inner loops of Krylov iterations. 
Thus our hierarchical Schur complement preconditioner combines, on each level in the approximation of 
the hierarchical structure of the global matrix, the idea of Schur complement with loops for a number of 
mutually independent inner Krylov iterations, and several matrix-vector multiplications for the off-diagonal 
blocks. Neither the global matrix, nor the matrix of the preconditioner need to be formed explicitly. The 
ingredients include only the number of stiffness matrices from the truncated Karhunen-Loeve expansion 
and a good preconditioned for the mean- value deterministic problem. We provide a condition number bound 
for a model elliptic problem and the performance of the method is illustrated by numerical experiments. 
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1. INTRODUCTION 

A set-up of mathematical models requires information about input data. When using partial 
differential equations (PDEs), the exact values of boundary and initial conditions along with 
the equation coefficients are often not known exactly and instead they need to be treated 
with uncertainty. In this study we consider the coefficients as random parameters. The most 
straightforward technique of solution is the famous Monte Carlo method. More advanced 
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techniques, which have became quite popular recently, include stochastic finite element methods. 
There are two main variants of stochastic finite elements: collocation methods [1,2] and stochastic 
Galerkin methods [3, 4, 5]. Both methods are defined using tensor product spaces for the spatial and 
stochastic discretizations. Collocation methods sample the stochastic PDE at a set of collocation 
points, which yields a set of mutually independent deterministic problems. Because one can use 
existing software to solve this set of problems, collocation methods are often referred to as non- 
intrusive. However, the number of collocation points can be quite prohibitive when high accuracy is 
required or when the stochastic problem is described by a large number of random variables. 

On the other hand, the stochastic Galerkin method is intrusive. It uses the spectral finite element 
approach to transform a stochastic PDE into a coupled set of deterministic PDEs, and because of 
this coupling, specialized solvers are required. The design of iterative solvers for systems of hnear 
algebraic equations obtained from discretizations by stochastic Galerkin finite element methods 
has received significant attention recently. It is well known that suitable preconditioning can 
significantly improve convergence of Krylov subspace iterative methods. Among the most simple, 
yet quite powerful methods, belongs the mean-based preconditioner by Powell and Elman [6], cf. 
also [7]. Further improvements include, e.g., the Kronecker product preconditioner by UUmann [8]. 
We refer to Rosseel and Vandewalle [9] for a more complete overview and comparison of various 
iterative methods and preconditioners, including matrix splitting and multigrid techniques. Also, an 
interesting approach to solver parallelization can be found in the work of Keese and Matthies [10]. 

Schur complements are historically well known from substructuring and, in particular, from the 
iterative substructuring class of the domain decomposition methods cf., e.g., monographs [11, 12]. 
However they have also shown to posses interesting mathematical properties, and they have been 
studied independently [13, 14]. The basic idea is to partition the problem and reorder its matrix 
representation such that a direct elimination of a part of the problem becomes straightforward. 
This reordering can be also performed recursively, which leads to the recursive Schur complement 
methods [15, 16, 17]. The multilevel Schur complement preconditioning in multigrid framework can 
be, to the best of our knowledge, traced back to Axelsson and Vassilevski [18, 19]. The Algebraic 
Recursive Multilevel Solver (ARMS) by Saad and Suchomel [20] and its parallel version (pARMS) 
by Li et al. [21] use variants of incomplete LU decompositions, and they are also closely related 
to the Hierarchical Iterative Parallel Solver (HIPS) by Gaidamour and Henon [22]. We also note 
that a remarkable idea for preconditioning non-symmetric systems using an approximate Schur 
complement has been proposed by Murphy, Golub and Wathen [23]. 

In this paper, we propose a symmetric preconditioner which takes advantage of the recursive 
hierarchy in the structure of the global system matrices. This structure is obtained directly from 
the stochastic formulation. In particular, the matrices posses a recursive hierarchical two-by-two 
structure, cf. [24, 25], where one of the submatrices is block diagonal and therefore its inverse can 
be computed by inverting each of the blocks independently. Moreover, each of the diagonal blocks 
is closely related to the deterministic mean-value problem. In fact, the diagonal blocks are obtained 
simply by rescaling the mean- value matrix in the case of linear Karhunen-Loeve expansion. So, 
assuming that we have a good preconditioner for the mean available, each block can be solved 
iteratively by an inner loop of Krylov iterations. Doing so, our hierarchical Schur complement 
preconditioner becomes variable because it combines, on each level in the approximation of the 
hierarchical structure of the global matrix, the idea of the Schur complement with loops for a number 
of mutually independent inner Krylov iterations, and several matrix-vector multiplications for the 
off-diagonal blocks. Due to variable preconditioning one has to make a careful choice of Krylov 
subspace methods, and their variants such as flexible conjugate gradients [26], FGMRES [27], or 
GMRESR [28] are preferred. However, in our numerical experiments, we have obtained the same 
convergence with the flexible and the standard versions of conjugate gradients. It is important to 
note that neither the global matrix, nor the preconditioner need to be formed explicitly, and we can 
use the so called MAT-VEC operations from [25] in both matrix- vector multiplications: by a global 
system matrix in the loop of outer iterations and in the action of the preconditioner. The ingredients 
of our method thus include only the number of stiffness matrices from the truncated Karhunen- 
Loeve expansion and a good preconditioner for the mean-value deterministic problem. Therefore 
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the method can be regarded as minimally intrusive because it can be built as a wrapper around 
an existing solver for the corresponding mean-value problem. Nevertheless in this contribution we 
neither address the parallelization nor the choice of the preconditioner for the mean- value problem. 
These two topics would not change the convergence in terms of outer iterations, and they will be 
studied elsewhere. 

The paper is organized as follows. In Section 2 we introduce the model problem, in Section 3 
we discuss the structure of the stochastic matrices, in Section 4 we formulate the hierarchical Schur 
complement preconditioner and provide a condition number bound under suitable assumptions, in 
Section 5 we outline possible variants of the method and provide details of our implementation, and 
finally, in Section 6 we illustrate the performance of the algorithm by numerical experiments, and 
in Section 7 we provide a short summary and a conclusion of the work presented in this paper. 



2. MODEL PROBLEM AND ITS DISCRETIZATION 



Let I? be a domain in M'*, d = 2, and let {fl, T , [i) be a complete probability space, where is the 
sample space, T is the ct— algebra generated by and ^ : J" — > [0, 1] is the probability measure. We 
are interested in a solution of the following elliptic boundary value problem: find a random function 
u (x, a;) : _D X M which almost surely (a.s.) satisfies the equation 



~V • (k (x, cj) Vu (x, bS)) ~ j {x) in _D X il, 
u {x, uj) ^ on dD x fl, 



(1) 
(2) 



where f £ L'^ {D), and k {x, a;) is a random scalar field with a probability density function dfi (oj) . 
We note that the gradient symbol V denotes the differentiation with respect to the spatial variables. 
Also, we will assume that there exist two constants < fcmin < fcmax such that 

/i (a; G : k^j^ < k {x, w) < fcmax ^x E =1. 

In the weak formulation of problem (l)-(2), we would like to solve 

ueU : a{u,v) = (/, v) , Vu G U. (3) 

Here f & U' with U' denoting the dual of U and (•, •) the duality pairing. The space U and its norm 
are defined, using a tensor product and expectation E with respect to the measure ^, as 



U = Hi (D) ^ Llin) , \\u\\^ 
The bilinear form a and right-hand side are 



iVwl dx 



UD 



a (u, v) 



k (x, uj) Vu • Vw dx 



Ud 



f vdx 



Ud 



Next, let us define the stochastic operator : [/ — > J7' by 

a {u, v) = {K^jU, v) , Vit; V <eU. 
So the problem (3) can be now equivalently written as the stochastic operator equation 

{K^ u, v) = (/, v), yve U. 



(4) 



(5) 



The operator is stochastic via the random parameter k{x,uj). Assuming that its covariance 
fvmction C (xi, X2) is known, we will further assume that it has the linear Karhunen-Loeve (KL) 
expansion truncated after N terms as 



N 



fc(x,w) =^fc,(a;)C,(w), Co^l, ~ U [0, 1] i^l 



,...,iV, 



(6) 
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such that (oj), i > are identically distributed, independent random variables. Here kg is the mean 
of the random field, and ki {x) — y/Xlvi (x) where (Ai, Vi {x))^y^ are the solutions of the integral 
eigenvalue problem 



C {xi,X2) Vi {X2) dx2 = XiVi (xi) , 



(7) 



see [5] for details. For the numerical experiments in this paper, we made a specific choice 

C{xi,X2) = cr^exp (- \\xi - X2II1 /L) , (8) 

with cr^ denoting the variance, and L the correlation length of the random variables {uj). Efficient 
computational methods for solution of the eigenvalue problem (7) are described, e.g., in [29]. 
Using the KL expansion of /c in the definition of the operator in (4), we obtain 



N 



{K^u, = ( ^ £,i (^) h (x) u {x, LO) , V {x, Uj) 



(9) 



i j=0 



Remark 1 

More generally than (6), we can consider the generahzed polynomial chaos (gPC) expansion of k 
as 

M' 



In both cases, we write 



k [x, Cj) = ^ fci {x) ipi (^(w)) . 
i=0 

L 

k {x, w) ^ fci {x) ipi (C(w)) , 



i=0 



ioi L = N in the KL case and L ~ M' in the gPC case. 

We will consider discrete approximations to the solution to (5) given by finite element 
discretizations of [D) and generalized polynomial chaos (gPC) discretizations of (il), 
namely 

Ndaf M 

u = ^ ^ u^j(j)^{x)ipj (^0, ■ ■ ■ , Cat) , (10) 

where {(t>i{x)}fji^ are suitable finite element basis functions, the gPC basis {ipj{£,)}jLo obtained 
as the the tensor product of Legendre polynomials of total order at most P and ^ — {^q, . . . , ^n). The 
choice of Legendre polynomials is motivated by the fact that these are orthogonal with respect to 
the probability measure associated with the uniform random variables ^a, ■ ■ ■ ,^n- The total number 
of gPC polynomials is thus M +1 = %^p^' , cf. also [5, p. 87]. 

Substituting the expansions (9) and (10) into (5) yields a deterministic linear system of equations 

M L 

^^c,,fei^,u, =/fe, k = 0,...,M, (11) 

j=0 1=0 

where {fk)i = E[J^ f (x) cjji (x) dx], {Ki)^.^^^ j^h{x)4)i{x)4)„i{x) dx, and the coefficients 
djk — E [ipiipji^k] ■ Each one of the blocks Ki is thus a deterministic stiffness matrix given by ki (x), 
cf. (9), of size (N^of x N^of), where N^of is the number of spatial degrees of freedom. The 
system (11) is then given by a global matrix of size ((A/ + 1) Ndof x (A/ + 1) Ndof), consisting 
of Ndof X Ndof blocks K^J-'^), and it can be written as 

^(0,0) ^(0,1) . . . 



^(M,0) J^(M.l) 



j^iM.M) 





Uq 




" /o " 




Uk 




fk 








Jm 



(12) 
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where each of the blocks K^^''^^ is in the KL case obtained as 

N 

i^O.'^) =^c„fei^,. (13) 

Remark 2 

With an iterative solution of (12) in mind, one needs to store only the constants c^fc, the blocks Ki 
and use the formula (13) for matrix-vector multiplication, see MAT-VEC operations in [25]. 

It is important to note that the first diagonal block is obtained by the 0— th order polynomial chaos 
expansion and therefore it corresponds to the deterministic problem obtained using the mean value 
of the coefficient k, in particular 

i^(O'O) - K„. 

The sparsity structure of the matrix in (12) will in general depend on the type of the gPC 
polynomial basis, on the number of terms retained in the expansions (9) and (10), and also on the 
number of stochastic dimensions. Nevertheless, due to the orthogonality of the gPC basis functions, 
the constants Cijk will vanish for many combinations of the indices i, j, and k. The block sparsity 
structure of the global stochastic Galerkin matrix in (12), with the blocks given by (13), will depend 
on a matrix cp with entries c^^ ^^^ — J2iLo ^ijk, where j, fc = 0, . . . , M. The typical structure of cp 
is illustrated by Figure 1. Looking carefully at the figures, we can observe a block hierarchical 
structure of the matrices. In the next section, we will study this structure in somewhat more detail. 




(a) Af = 4, P = 4 gives 350 blocks (b) = 4, P = 7 gives 2010 blocks 

Figure 1 . Hierarchical structure of the matrix c p which determines the block sparsity of the global stochastic 
Galerkin matrix with A*" = 4 stochastic dimensions using (a) P = 4, or (b) P = 7 order of polynomial 
expansion. The sub-blocks correspond to the polynomials of order (a) P = 1, 2, 3 and (b) P — 4, 5, 6. 



3. STRUCTURE OF THE MODEL MATRICES 

Let us begin by an illustration. Figure 1(a) shows the structure of the stochastic Galerkin matrix 
based on the fourth order polynomial chaos expansion in four stochastic dimensions. The schematic 
matrix in the picture is cp (here P = 4), so in the global stochastic Galerkin matrix as it is written 
in eq. (12) each tile corresponds to a block of a stiffness matrix with the same sparsity pattern as 
the original finite element problem. Now, let us denote the corresponding global Galerkin matrix 
by A4 , and by ^3, B4, C4 and its four submatrices, cf. (15). We see that is block diagonal 
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and the structure of A3 resembles the structure of A4 and this hierarchy is repeated all the way to 
the 1x1 block Aq and a block diagonal matrix Di. The number in the subscript indicates that the 
entries in the block correspond to the polynomial expansion in the case of (a) A3 of order three or 
less, and (b) B3, C3 and D3 of order three. Clearly, the sparsity and hierarchical structure follows 
from orthogonality of the polynomials as was pointed out in [25]. More specifically, let us consider 



a two-by-two block structure of a (square) coefficient matrix cp with dimensions 



{N+py. 

N\P\ 



as 



Cp 



cp-i 
bp 



bl 



where cp-i is the first principal submatrix with dimensions 



(N+P-l)\ 



N\(p-r)\' ^^'^ remaining blocks 
are defined accordingly. Generally, let us consider a recursive hierarchy in the spliting of cp as 



bp 



bJ 
dp 



where the dimensions of cp are given by 



{N+ey 

NW. ' 



the dimensions of the first principal 



submatrices cp-i are given by ^^\(^p_l)\ , and the remaining blocks are defined accordingly. We note 
that even though the matrices cp are symmetric, the stochastic Galerkin matrix will be symmetric 
only if each one of the matrices if, is itself symmetric. We refer, e.g., to [9, 30] for further details 
and discussion, and state here only the essential observation for our approach: 

Lemma 3 ([9, Corollary 2.6]) 

The block dp is a diagonal matrix for alH = 1, . . . , P. 



The global problem (12) can be equivalently written as 

Apup ^ /p, 
with the matrix Ap having a hierarchical structure 



Ap 



Ap-i 
Cp 



Bp 
Dp 



(14) 



(15) 



where the subscript t stands for the blocks obtained by an approximation by the ^— th degree 
stochastic polynomial (or lower), and all of the blocks Dp are block diagonal. In particular the 
smallest case is given by the finite element approximation with the mean values of the coefficients, 
and therefore the mean-value problem is 



(16) 



and in particular Aq = Kq. In this paper, we will assume that the inverse of Aq is known, or at least 
that we have a good preconditioner Mq readily available. 

Remark 4 

Clearly, if all of the matrices Ki are symmetric, the global matrix A p and all of its submatrices Ap 
will be symmetric as well, i.e.. 



Ap^ 



Ap-i Bp 
Bj Dp 



A- 



However, for the sake of generality, we will use the non-symmetric notation (15). We note that a 
question under what conditions is the global problem positive definite is far more delicate, in general 
depends on the type of the polynomial expansion and also on the choice of the covariance function. 

In the next section we introduce our preconditioner, taking advantage of the hierarchical structure 
and of the fact that the matrices Dp, where ^ = P, . . . , 1, are block diagonal. 
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4. SCHUR COMPLEMENT PRECONDITIONER 
Let us find an inverse of a general block matrix given as 



A B 
C D 



(17) 



assuming that we can easily compute the inverse of D. By block LU decomposition, we can derive 

(18) 



" A 


B ' 




C 


D 





Ia BD-^ 
Id 



S 
D 



Ia 
D-^C I 



D 



where S = A — BD is the Schur complement of Z) in (17). Inverting the three blocks, we obtain 



" A 


B ' 


-1 


C 


D 





Ia 

-D-^C Id 











I A -BD-^ 
Id 



(19) 



The hierarchical Schur complement preconditioner is based on the block inverse (19). In the 
action of the preconditioner, application of the three blocks on the right-hand side of (19) will be 
called (in the order in which they are performed) as pre-correction, correction and post-correction. 

So, in the action of the preconditioner we would like to approximate problem (14) which with 
respect to (15) can be written as 



(20) 



Ap-i Bp 








\ 1 


Cp Dp 




Up 







The matrix inverse can be with respect to (19) written as 



Ap-i 
Cp 



Bp 
Dp 



Ia 



-D-^Cp 





Id 



Dp' 



Ia 




-BpDp^ 



where 



Sp-i — Ap^i — BpDp^Cf 



Because computing (and inverting) the Schur complement Sp-i explicitly is computationally 
prohibitive, we suggest to replace the inverse of Sp-i by the inverse of Ap^i. Since Ap^i has 
the hierarchical structure as described by (15), i.e.. 



Ap-i = 



Ap-2 Bp-i 
Cp-i -Dp-i 



we can approximate its inverse again using the idea of (19) and so on. Eventually, we arrive at 
the Schur complement of the mean-value problem Sq which we replace by Aq. Thus the action of 
this hierarchical preconditioner M p consists of a number of pre-correction steps performed on the 
levels £ = P, . . . , 1, solving the "mean- value" problem with Aq on the lowest level, and performing 
a number of the post-processing steps sweeping up the levels. We now formulate the preconditioner 
for the iterative solution of the global problem (14) more concisely as: 

Algorithm 5 (Hierarchical Schur complement preconditioner) 
The preconditioner Mp : rp \ — > up is defined as follows: 
for £ = P, ... 1, 

split the residual, based on the hierarchical structure of matrices, as 

ri 
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re-i 



compute the pre-correction as 

9 

lf£>l, set 

Else (if i — 1), solve the system ^o^o — go- 
end 

tor£^l,...P, 
compute the post-correction, i.e., set ui^^ — solve 



Di'{4-C,4-') 



and concatenate 



Ui 



/-I 



If£ < P, set^ 



end 



We will now restrict our considerations to the case when all of the matrices Ai, £ — P, . . . ,1 aie 
symmetric, positive definite. In this case, the decomposition (18) can be written for all levels £ as 



A,, = 



Ai-i 

bT 



Be 
Dp 



I A BpDJ^ 
Id 



Se-i 
Dp 



Ia 







DV'Bi I 



D 



Because all of the matrices Ai, £ = P, . . . ,1 are positive definite, the above becomes a set of 
congruence transformations and by the Sylvester law of inertia, all of the Schur complements Sp, 
i = P — 1, . . . ,0 are also symmetric positive definite. Thus, we can establish for appropriate 
vectors u the next set of inequalities. 



0, 



,p-l, 



(21) 



where = u'^Au denotes the energy norm, and use it in the following: 
Theorem 6 

For the symmetric, positive definite matrix Ap the preconditioner Mp defined by Algorithm 5 is 
also positive definite, and the condition number k of the preconditioned system is bounded by 



Amax {MpAp) 

[MpAp) 



where C = IT 



P-l "^^,2 



Proof 

The bound follows directly from the sequential replacement of the Schur complement operators Se 
by the hierarchical matrices Ap in Algorithm 5, and the bounds in the equivalence (21). □ 

Hence, the convergence rate can be established from the spectral equivalence (21). 
Remark 7 

Despite the multiplicative growth of the condition number bound as predicted by Theorem 6 
from our numerical experiments (Table II) it appears that, at least in the case of uniform random 
variables and Legendre polynomials, the ratio of the constants in (21) is close to one and hence the 
convergence of conjugate gradients is not as pessimistic as predicted by the bound. 



In the next section, we discuss several modifications of the method and the preconditioner. 
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5. VARIANTS AND IMPLEMENTATION REMARKS 

Clearly, there are many other ways of setting up a hierarchical preconditioner. These possibilities 
follow by considering the block inverse (19) and writing it in a more general form, which can be 
subsequently used in the approximation of the preconditioner from Algorithm 5, as 



M = 



Ia 
^MlC Id 



Ms 
Af2 



Ia -BM}, 
Id 



(22) 



so that M}j , z = 1, 2, 3, approximate D^^ and Ms approximates S^^. Our main approximation in 
Algorithm 5 is in using the hierarchy of matrices Ag, i = P — 1, . . . ,0 m place of Ms on each level. 
Next, in our case D is block-diagonal. Thus computing its inverse means solving independently 
a number of systems, where each one of them has the same size (and sparsity structure) as the 
deterministic problem for the mean. In fact, the diagonal blocks are just scalar multiples of the 
"mean-value" matrix Ko{— Aq). In our implementation, we have replaced the exact solves of D 
by independent loops of preconditioned Krylov subspace iterations for each diagonal block of D 
using the mean-value preconditioner Mq. In the numerical experiments we have tested convergence 
with the following choices of Mq: no preconditioner, simple diagonal preconditioner, and the exact 
LU decomposition of the block Aq (which converges in one iteration). So this variant of the 
hierarchical Schur complement preconditioner involves multiple loops of inner iterations and thus 
possibly changes in every outer iteration. In order to accommodate such variable preconditioner, 
it is generally recommended to use a flexible Krylov subspace method such as flexible CG [26], 
FGMRES [27], or GMRESR [28]. Nevertheless, we have observed essentially the same convergence 
in terms of outer iterations with both variants of the conjugate gradients, the flexible and the 
standard one as well. The convergence seems also to be independent of the choice of Mq and in 
this contribution we do not advocate any specific choice. Next, one can in general replace the action 
of any M}j, i — 1,2,3, by the action of just Mq itself. However it is well-known from iterative 
substructuring cf., e.g., [11, Section 4.4], that even if Md is spectrally equivalent to D^^, the 
resulting preconditioner might not be spectrally equivalent to the original problem. 

It also appears that one can modify not only the preconditioner, but also the set up of the method 
itself. Namely, inspired by the iterative substructuring cf., e.g., [12], one can reduce the system given 
by Ap to the system given by the Schur complement Sp-i used subsequently in the iterations. So, 
in the first step, cf. (20), we eliminate Up and define up-i = Up^^, which yields 

Sp-iup-i = gp-i, (23) 

where 

Sp^,^Ap_,-BpDp'Cp, and 9p-i = fj^^' - BpDp' fj^ . 
After convergence, the variables Up are recovered from 

"P = Dp' - Cpup_i) . 

There are two advantages of the a-priori elimination of the second block: first, because the 
system (23) will be solved iteratively, the iterations can be performed on a much smaller system 
and also, at least for symmetric, positive definite problems, the condition number of the Schur 
complement cannot be higher than the one of the original problem [11] even if one uses a 
diagonal preconditioning [31]. The preconditioner Mp^i for the system (23) is then the same as in 
Algorithm 5 except that the for-loops are performed only for all levels I = 1, . . . ,P — 1. However, 
this reduction is theoretically justified only when exact solves for the block diagonal matrix Dp 
are available. In general, if one uses only approximate solves, e.g., by performing inner/outer 
Krylov iterations for Dp and S'p_i respectively, the global system matrix becomes variable as well, 
this might lead to the loss of orthogonality and poor performance of the method. Our numerical 
experiments indicated that the preconditioned iterations for Ap and S'p-i perform identically, but 
we do not advocate to use a-priori reduction to the Schur complement in general. 
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Figure 2. The 15 dominant eigenvalues with the covariance kernel (8) (in this plot a = 1). 



6. NUMERICAL EXAMPLES 

We have implemented the stochastic Galerkin finite element method for the model elliptic 
problem (l)-(2) on a square domain [0, 1] x [0, 1] uniformly discretized by 10 x 10 Lagrangean 
bilinear finite elements. The mean value of the coefficient k was set to fco = 1- The coefficients 
in the covariance function C defined by eq. (8) were set to L = 0.5 and a = 0.5, so the 
coefficient of variation is given as CoV = a/ko — a — 50%. The 15 dominant eigenvalues of 
the discretized eigenvalue problem (7) are shown in Figure 2. We have studied convergence of 
the flexible version of the conjugate gradient method (FCG) without a preconditioner, with a 
global mean-based preconditioner by Powell and Elman [6], with the block symmetric Gauss- 
Seidel preconditioner A/bGS (with zero initial guess) and with the hierarchical Schur complement 
preconditioner Mhs- The convergence results are summarized in Tables 1-lV. We have observed 
essentially the same convergence of the standard conjugate gradients compared to the flexible 
version, which is reported in the tables. Also, in our experience, the convergence rates were 
independent of the choice of the mean-value preconditioner Mq (no preconditioner, diagonal 
preconditioner and the LU-decomposition of the "mean-value" block Aa) used in inner iterations of 
the preconditioner for the diagonal block solves with the same relative residual tolerance as in the 
outer iterations. From Tables 1 and 11 it appears that the convergence depends only mildly on the 
stochastic dimension N and the order of polynomial expansion P, respectively. Table 111 indicates 
a modest dependence on the value of the standard deviation a, and finally Table IV indicates that 
the convergence is independent of the mesh size h. We note that for CoV > 55% the problem is no 
longer guaranteed to be elliptic, and the global matrix A is not positive definite. 

Table V summarizes the block count in the structure of the global Galerkin matrix A obtained 
using the KL expansion, cf. Figure 1, when either of the parameters N or P changes and the other 
one is set to be equal to four. The two choices lead to slightly different block sparsity structures of A, 
however the numbers of blocks are the same. Let us denote by ni, the total number of blocks in A and 
by Udb the number of its diagonal blocks. Note that one application of the mean-based preconditioner 
requires Udb solves of the diagonal blocks. The columns three and four in Table V contain the 
numbers of block matrix-vector multiplications n„i and block diagonal solves Uds performed in one 
action of the hierarchical Schur preconditioner. From Algorithm 5 we obtain that 

nm = nb - ridb, 

where half of multiplications is performed in the first for-loop and the other half in the second, and 

Uds = 2{ndb - 1) + 1, 
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Table I. Convergence of (flexible) conjugate gradients for the full system matrix A, for A preconditioned 
by the mean-based preconditioner Mm, by the block Gauss-Seidel preconditioner A/bcS' ™d by the 
hierarchical Schur complement preconditioner Mug. The coefficient of variation of the uniform random 
field is CoV = 50%, polynomial degree is P = 4, and the stochastic dimension is variable. Here, ndof 
is the dimension of A, iter is the number of iterations with the relative residual tolerance 10^^, and k is the 
condition number estimate from the Lanczos sequence in conjugate gradients. 



setup 


A 


MmA 


MbGsA 


Mas A 


N 


ndof 


iter 




iter 


K 


iter 


K 


iter 




1 


605 


173 


1965.4 


12 


2.0127 


5 


1.0507 


5 


1.0465 


2 


1815 


531 


5333.3 


15 


2.7340 


6 


1.1279 


6 


1.1236 


3 


4235 


745 


9876.9 


16 


2.9995 


7 


1.1693 


6 


1.1514 


4 


8470 


902 


17,150.2 


17 


3.3413 


7 


1.2131 


7 


1.2028 


5 


15,246 


1033 


17,275.8 


18 


3.5891 


7 


1.2447 


7 


1.2434 


6 


25,410 


1037 


17,333.5 


18 


3.6349 


7 


1.2501 


7 


1.2559 


7 


39,930 


1040 


17,348.9 


19 


4.0993 


8 


1.3202 


7 


1.3146 


8 


59,895 


1081 


17,360.6 


19 


4.0597 


8 


1.3198 


7 


1.3182 



Table 11. Convergence of (flexible) conjugate gradients for the full system matrix A, for A preconditioned 
by the mean-based preconditioner Mm, by the block Gauss-Seidel preconditioner MbcS' by the 
hierarchical Schur complement preconditioner A/hs- The stochastic dimension is A'' = 4, CoV = 50%, and 
the polynomial degree P is variable. The other headings are same as in Table 1. 



setup 


A 


MmA 


MbGS^ 


MhsA 


P 


ndof 


iter 


K 


iter 


K 


iter 


K 


iter 




1 


605 


134 


625.6 


9 


1.6391 


5 


1.0626 


5 


1.0624 


2 


1815 


315 


1903.2 


13 


2.2379 


6 


1.1117 


6 


1.1109 


3 


4235 


586 


5721.1 


15 


2.8122 


7 


1.1658 


6 


1.1559 


4 


8470 


902 


17,150.2 


17 


3.3413 


7 


1.2131 


7 


1.2028 


5 


15,246 


1402 


29,751.0 


18 


3.7824 


7 


1.2538 


7 


1.2426 


6 


25,410 


1943 


49,842.4 


19 


4.1534 


8 


1.2921 


7 


1.2798 


7 


39,930 


2568 


83,056.6 


20 


4.4708 


8 


1.3219 


7 


1.3125 


8 


59,895 


3267 


136,419.0 


20 


4.7371 


8 


1.3472 


7 


1.3398 



Table 111. Convergence of (flexible) conjugate gradients for the full system matrix A, its first Schur 
complement S, for A preconditioned by the global mean-based preconditioner Mm, by the block Gauss- 
Seidel preconditioner A/^gS' ^nd by the hierarchical Schur complement preconditioner A/jjs- Here, the size 
of A is 8470 ndof, the stochastic dimension is TV = 4, the polynomial degree is P = 4, the mean is fco — 1, 
and the coefficient of variation CoV is variable. The other headings are same as in Table I. 



setup 


A 


MmA 


MbGS^ 


Mas A 


CoV{%) 


iter 


K 


iter 




iter 


K 


iter 


K 


5 


694 


15,556.3 


6 


1.0960 


3 


1.0008 


3 


1.0009 


15 


739 


15,673.2 


9 


1.3514 


4 


1.0090 


4 


1.0089 


25 


804 


15,912.5 


11 


1.7021 


5 


1.0314 


5 


1.0304 


35 


833 


16,286.1 


13 


2.1808 


6 


1.0770 


5 


1.0664 


45 


877 


16,815.9 


16 


2.8773 


6 


1.1510 


6 


1.1414 


55 


926 


17,539.6 


19 


3.9523 


8 


1.2948 


7 


1.2830 



which follows from the two for- loops and one solve of the first block Aq. Hence one action of 
the hierarchical Schur preconditioner requires nearly the same number of computations as one 
global Galerkin matrix- vector multiplications, n„i w n^, and two applications of the mean-based 
preconditioner, nds ~ "^^ndt- It is important to note that whereas the application of the mean-based 
preconditioner can be performed fully in parallel, the two for-loops in Algorithm 5 are sequential, 
and thus the eventual parallelization can be performed only within each step of these for-loops. The 
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Table IV. Convergence of (flexible) conjugate gradients for the full system matrix A, for A preconditioned 
by the global mean-based preconditioner Afm, by the block Gauss-Seidel preconditioner MbcS' ^nd by 
the hierarchical Schur complement preconditioner A/hs- Here, the stochastic dimension is A*' = 4, the 
polynomial degree is P = 4, the mean is ko = 1, the coefficient of variation is CoV — 50%, and the size of 
the finite element mesh h is variable. The other headings are same as in Table I. 



setup 


A 




MbGS^ 


MhsA 


h ndof 


iter 


K 


iter 




iter 


K 


iter 


K 


1/5 2520 


404 


4847.5 


16 


3.2484 


7 


1.2022 


6 


1.1790 


1/10 8470 


902 


17,150.2 


17 


3.3413 


7 


1.2131 


7 


1.2028 


1/15 17,920 


1386 


36,716.6 


17 


3.3145 


7 


1.2063 


7 


1.2047 


1/20 30,870 


1883 


63,535.2 


17 


3.3463 


7 


1.2110 


7 


1.2032 


1/25 47,320 


2383 


97,605.6 


17 


3.3473 


7 


1.2112 


7 


1.2032 


1/30 67,270 


2872 


138,929.0 


17 


3.3190 


7 


1.2070 


7 


1.2054 



Table V. Numbers of blocks in the full system matrix and the "work-count" in the application of the 
preconditioner M, when one of the parameters A*' or P is changing and the other one is set to 4, cf. Figure 1. 
Here n(, is the total number of blocks, is the number of diagonal blocks, which is the same as the number 
of solves in the application of the mean-based preconditioner Mm, rim is the number of block matrix-vector 
multiplications in the action of the preconditioner M, and n^s is the number of its block diagonal solves. 



iVorP 


rih 


ndb 




nds 


1 


13 


5 


8 


9 


2 


55 


15 


40 


29 


3 


155 


35 


120 


69 


4 


350 


70 


280 


139 


5 


686 


126 


560 


251 


6 


1218 


210 


1008 


419 


7 


2010 


330 


1680 


659 


8 


3135 


495 


2640 


989 



work count of iV/bcs. which is block sequential, is given by 2ndb diagonal solves, and 1.5 (or 2, if 
the initial guess of GS is nonzero) times of block matrix-vector multiplications compared to Mhs- 

In the second set of experiments, we have tested convergence of the preconditioner with the 
same physical domain and parameter setting, except assuming that the random coefficient k has 
lognormal distribution with the coefficient of variation being set to CoV = txiog/^iog = 100%. We 
note that in order to guarantee existence and uniqueness of the solution, we have used twice the 
order of polynomial expansion of the coefficient k than of the solution, cf. [32]. Such discretization 
is done within the gPC framework, see Remark 1, using Hermite polynomials [33], and leads 
to a fully block dense structure of the global Galerkin matrix A. Therefore the solves involving 
submatrices Di, £ = 1, . . . , P, in the pre- and post-correction steps are no longer block diagonal. 
Our numerical tests using both, direct and iterative solves with the De, and using the same tolerance 
as for the outer iterations, lead to the same count of outer iterations. The performance results are 
summarized in Tables VI-IX. The convergence rate reported in Table VI indicates a mild dependence 
on the stochastic dimension N, Table VII indicates a modest dependence on the order of the 
polynomial expansion P, and Table VIII indicates also a modest dependence on the coefficient of 
variation CoV. From Table IX we see that the convergence is nearly independent of the mesh size h. 
The performance of both preconditioners A/bcs and A/hs is significantlly better compared to the 
mean-based preconditioner Afm. Also, we see that A/hs performs a bit better than A/bGS- However, 
we must note that A/hs is also more computationally intensive because it requires solves with larger 
diagonal submatrices Di, for all levels £ — 1, . . . , P, and a work count comparison with A/bcs is 
not straightforward. As before, the two for-loops corresponding to Algorithm 5 are sequential, and 
thus the eventual paralelisation can be performed only within each step in the for-loop. 

The numerical experiments presented here were implemented using a sequential code in Mat lab, 
version 7.12.0.635 (R2011a), and therefore we do not report on computational times. 
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Table VI. Convergence of (flexible) conjugate gradients for the full system matrix A obtained by the 
gPC expansion of the lognormal field, for A preconditioned by the mean-based preconditioner Mm, by 
the block Gauss-Seidel preconditioner AfbcS' ™d by the hierarchical Schur complement preconditioner 
Mfjs- Polynomial degree is fixed to P = 4, the coefficient of variation of the lognormal random field is 
CoV = 100%, and the stochastic dimension A'' is variable. The other headings are same as in Table I. 



setup 


A 


MmA 


MbGS^ 


MhsA 


N ndof 


iter K 


iter K 


iter K, 


iter K 


1 605 

2 1815 

3 4235 

4 8470 


585 51,376.4 
1396 58,718.8 
1770 69,054.8 
2016 70,143.6 


48 28.7589 

61 37.1593 

62 38.0715 
66 43.6525 


15 3.4192 
17 3.7490 
17 3.7380 
19 4.2935 


15 3.4000 

16 3.6244 
16 3.7632 
16 4.1669 



Table VII. Convergence of (flexible) conjugate gradients for the full system matrix A obtained by the 
gPC expansion of the lognormal field, for A preconditioned by the mean-based preconditioner A/m, by 
the block Gauss-Seidel preconditioner AfbcS' ™d by the hierarchical Schur complement preconditioner 
A/hs • Stochastic dimension is fixed to A'' = 4, the coefficient of variation of the lognormal random field is 
CoV — 100%, and the polynomial degree P is variable. The other headings are same as in Table I. 



setup 


A 


MmA 


A4gs^ 


MusA 


P 


ndof 


iter 




iter 




iter 


K 


iter 




1 


605 


134 


578.2 


15 


3.4954 


8 


1.3910 


1 


1.3856 


2 


1815 


329 


2027.3 


28 


8.9450 


12 


1.9742 


10 


1.9289 


3 


4235 


804 


10,048.4 


44 


20.0366 


15 


2.8670 


13 


2.7955 


4 


8470 


2016 


70,143.6 


66 


43.6525 


19 


4.2935 


16 


4.1669 



Table VIII. Convergence of (flexible) conjugate gradients for the full system matrix A obtained by the gPC 
expansion of the lognormal field, for A preconditioned by the mean-based preconditioner Mm, by the block 
Gauss-Seidel preconditioner Afbcs, and by the hierarchical Schur complement preconditioner A/jjs- Here, 
the size of A is 8470 ndof, the stochastic dimension is = 4, the polynomial degree is P = 4, and the 
coefficient of variation of the lognormal field CoV is variable. The other headings are same as in Table I. 



setup 


A 


MmA 


AfbGS^ 


M 




CoV (%) 


iter 




iter 


K 


iter 


K 


iter 


K 


25 


719 


7378.4 


16 


3.2356 


7 


1.1761 


7 


1.1776 


50 


1039 


16,014.8 


29 


9.3553 


11 


1.7685 


10 


1.7836 


75 


1511 


35,317.3 


46 


22.2147 


15 


2.8198 


13 


2.8454 


100 


2016 


70,143.6 


66 


43.6525 


19 


4.2935 


16 


4.1669 


125 


2591 


116,678.0 


85 


72.7584 


23 


5.9776 


19 


5.5362 


150 


3209 


178,890.0 


103 


107.0670 


26 


7.7459 


21 


6.8507 



Table IX. Convergence of (flexible) conjugate gradients for the full system matrix A obtained by the gPC 
expansion of the lognormal field, for A preconditioned by the mean-based preconditioner Mm, by the block 
Gauss-Seidel preconditioner AfbcS' ^nd by the hierarchical Schur complement preconditioner Afjjs- Here, 
the stochastic dimension is A'^ = 4, the polynomial degree is P — A, the coefficient of variation of the 
lognormal random field is CoV = 100%, and the size of the finite element mesh h is variable. The other 

headings are same as in Table I. 



setup 


A 


MmA 


MbGS^ 


MmA 


h ndof 


iter K 


iter K 


iter K 


iter K 


1/5 2520 
1/10 8470 
1/15 17,920 
1/20 30,870 
1/25 47,320 
1/30 67,270 


831 17,695.3 
2016 70,143.6 
3377 158,334.0 
4395 275,686.0 
5600 429,551.0 
7180 626,475.0 


59 40.6232 
66 43.6525 

68 44.4170 

69 44.8882 
69 44.9413 
71 45.1100 


18 3.9885 

19 4.2935 
19 4.3764 

19 4.3742 

20 4.3986 
19 4.3732 


15 3.8361 

16 4.1669 

16 4.2394 

17 4.2510 
17 4.2592 
17 4.2630 
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7. CONCLUSION 

We have presented a hierarchical Schur complement preconditioner for the iterative solution of 
the systems of linear algebraic equations obtained from the stochastic Galerkin finite element 
discretizations. The preconditioner takes an advantage of the recursive hierarchical two-by-two 
structure of the global matrix, with one of the submatrices block diagonal. We have compared its 
convergence using (flexible) conjugate gradients without any preconditioner, with the mean-based 
preconditioner which requires one block diagonal solve per iteration, and with the block version 
of the well-known symmetric Gauss-Seidel method used as a preconditioner. The algorithm of our 
preconditioner consists of a loop of diagonal block solves and a multiplication by the upper block 
triangle in the pre-correction loop, and of another loop of diagonal block solves and a multiplication 
by the lower block triangle in the post-correction loop. The loops are sequential throughout the 
hierarchy of the global matrix, but the block solves are independent within each level. We have 
also succesfully tested the preconditioner in the case of the random coefficient with lognormal 
distribution. However, in this case the algorithm involves solves (either direct or of preconditioned 
inner iterations) with larger submatrices than just the diagonal blocks, and a direct comparison to 
the symmetric block Gauss-Seidel preconditioner in terms of work count is not straightforward. 

In conclusion, our algorithm appears to be more effective in terms of iterations and work count 
compared to the block version of the symmetric Gauss-Seidel method. Our method also allows for 
the same degree of parallelism as the Gauss-Seidel method, since both involve solving the block 
diagonal matrices Dg. It is important to note that the discussed preconditioners in general rely only 
on (block-by-block) matrix- vector multiplies, and their performance will also depend on the choice 
of preconditioner AIq for the solves with the diagonal blocks. Clearly, one can use such solver for 
each one of the diagonal blocks that might introduce another level of parallelism, e.g., similarly as 
recently proposed in [34, 35, 36]. However such extensions will be studied elsewhere. 
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